A major bottleneck in the boot time of Amazon EC2 instances is the initial loading of data blocks from S3 into the EBS root volume. Pre-warming the volume by booting and stopping the instance once speeds this up. A further optimization is to do the warming on a cheaper instance type and resize the instance before the real start. Together, these optimizations helped the author reduce boot times from 40 seconds to 5 seconds.
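The warm-then-resize sequence described above might look roughly like the following sketch. It is not the author's code: the function name `prewarm_and_resize`, its parameters, and the example instance ID and types are hypothetical. It assumes `ec2` is a boto3 EC2 client, that the instance is currently stopped, and that the two instance types are compatible with the same AMI. Only blocks actually read during that first boot are likely to be warmed.

```python
def prewarm_and_resize(ec2, instance_id, warm_type, target_type):
    """Boot a stopped instance once on a cheap type, stop it, then resize it.

    `ec2` is assumed to be a boto3 EC2 client (or anything with the same
    methods). All names here are illustrative, not taken from the article.
    """
    ids = [instance_id]

    # The instance type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": warm_type}
    )

    # First boot: pulls the blocks needed for booting from S3 into the volume.
    ec2.start_instances(InstanceIds=ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)

    # Stop again so the warmed volume is kept and the type can be changed.
    ec2.stop_instances(InstanceIds=ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)

    # Switch to the instance type that will be used for the real start.
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": target_type}
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   prewarm_and_resize(boto3.client("ec2"), "i-0123456789abcdef0",
#                      "t3.micro", "c5.xlarge")
```

Passing the client in as an argument keeps the function easy to exercise with a stub. Note that the `instance_running` waiter only reports the EC2 state, so a real setup may want to wait for the operating system to finish booting before stopping.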
Friday, May 24, 2024
Amazon's Business Data Technologies (BDT) team is migrating from Apache Spark to Ray for large-scale data processing to reduce costs and improve efficiency. The migration was driven by the need to address scaling and performance issues with their existing Apache Spark compactor. The migration itself involved rigorous testing and validation to ensure data quality and reliability.